External models (Gemini Nano Banana & OpenAI GPT Image) (#8633) #8884
lstein merged 63 commits into invoke-ai:main from …
|
I did some testing. It works fine (I only tested Gemini). A few comments.
|
Some changes that should be made:

In …:
    from invokeai.app.api.dependencies import ApiDependencies
    record = ApiDependencies.invoker.services.model_manager.store.get_model(request.model.key)
No other service in the codebase imports from …

In …:
    MODEL_SOURCE_TO_TYPE_MAP = {
        ...
        ExternalModelSource: ModelSourceType.Url,
    }

In …:
    for config in (runtime_config, file_config):
        config.update_config(updates)
        for field_name, value in updates.items():
            if value is None:
                config.model_fields_set.discard(field_name)
This directly mutates the …

In …

In …

I think saving the API keys in invoke.yaml is not the best choice here. This is something the development team should discuss in https://discord.com/channels/1020123559063990373/1049495067846524939.
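One way to address the point about importing `ApiDependencies` inside a service, sketched under assumptions: rather than reaching into the global invoker, the service could receive the model record store when it is constructed. Every class and method name below (`ModelRecord`, `ModelRecordStore`, `InMemoryStore`, `ExternalGenerationService`, `resolve_model`) is a hypothetical stand-in for illustration, not InvokeAI's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical stand-in for a model config record."""

    key: str
    name: str


class ModelRecordStore(Protocol):
    """The only capability the service needs: look up a record by key."""

    def get_model(self, key: str) -> ModelRecord: ...


class InMemoryStore:
    """Toy store for the example; a real one would be backed by a database."""

    def __init__(self, records: List[ModelRecord]) -> None:
        self._records: Dict[str, ModelRecord] = {r.key: r for r in records}

    def get_model(self, key: str) -> ModelRecord:
        return self._records[key]


class ExternalGenerationService:
    """Receives its dependency via the constructor instead of importing a global."""

    def __init__(self, store: ModelRecordStore) -> None:
        self._store = store

    def resolve_model(self, model_key: str) -> ModelRecord:
        return self._store.get_model(model_key)


store = InMemoryStore([ModelRecord(key="abc123", name="example-external-model")])
service = ExternalGenerationService(store)
resolved = service.resolve_model("abc123")
```

With this shape the service can be unit-tested against a fake store, and the API layer remains the only place that knows about the global dependency container.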
|
@CypherNaught-0x Please see the changes requested by @Pfannkuchensack above.
|
Thanks @Pfannkuchensack for the valuable feedback! I didn't feel comfortable disabling all the inputs, as I didn't see this done elsewhere, but you are right, of course, that it makes no sense to show, for example, CFG when that property is not used. This was more of a first draft: I wasn't sure how such a large addition would be received, so I haven't yet spent much time polishing things like the install queue. I was positively surprised by the feedback, so I'll try to get things to a more polished state for the next review round. How are the discussions on API key storage coming along? I saw that the model marketplace can store API keys there as well, so I figured that with a decently restricted API key this might be OK, though I'd obviously also prefer at least non-plain-text storage.
|
Pfannkuchensack@3c83692 I did some work on hiding unneeded things in the UI. Maybe take a look, or copy the whole thing from there.
|
The API keys should be stored in a separate YAML file. That is better because it keeps the keys apart from the rest of the configuration.
We need a unified place to stash users' security tokens and API keys. I just proposed a "Token Manager" in Issue #8904. Temporarily, you could add …
@Pfannkuchensack Does this seem like a reasonable interim solution to API key storage, or would it be better to have a completely separate API keys file, like …
|
I would prefer the separate file, especially since there will be another solution later, thus avoiding major changes to the invoke.yaml file. |
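A minimal sketch of the separate-file approach, under stated assumptions: the file name `api_keys.json`, its location, and the provider-to-key layout are all hypothetical, since the thread does not fix them. It uses JSON from the standard library as a stand-in for YAML to stay dependency-free, and creates the file with owner-only permissions so that keys are not readable by other users on POSIX systems.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


def save_api_keys(path: Path, keys: Dict[str, str]) -> None:
    """Write provider -> key mappings to their own file, owner-readable only."""
    # O_TRUNC replaces old content; mode 0o600 applies only when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(keys, f, indent=2)


def load_api_keys(path: Path) -> Dict[str, str]:
    """Return the stored keys, or an empty mapping if the file does not exist yet."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Hypothetical location; a real app would keep this next to its main config file.
keys_file = Path(tempfile.mkdtemp()) / "api_keys.json"
save_api_keys(keys_file, {"gemini": "example-key", "openai": "another-example-key"})
loaded = load_api_keys(keys_file)
```

Keeping the keys in their own file means the main configuration can be shared or committed without leaking secrets, and a later token manager could replace these two functions without touching the rest of the config.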
|
@CypherNaught-0x I'm wondering what you see as the timetable for this? I'm thinking we'll be ready for a 6.12 release in the second week of March. Would that be targetable, or later? The release after that will likely be mid April. |
I had already started work on this, and your implementation looks very similar, so I'll try to integrate them. Also very much looking forward to …
|
@lstein I've had some time to work on it. I'll try to get things into a polished state and push the changes. I believe mid-March should be a very reasonable release target.
…rnal graph
- Export imageSizeChanged from paramsSlice (required by the new ImageSize recall handler).
- Emit the external graph's metadata model entry via zModelIdentifierField since ExternalApiModelConfig is not part of the AnyModelConfig union.
|
Thanks for the fixes. I've done some functional testing with the Gemini models and encountered a few remaining hitches.
|
|
https://ai.google.dev/gemini-api/docs/image-generation?hl=en#2_inpainting_semantic_masking I'll take a look at the rest.
…estrict GPT Image models to txt2img
|
I'm still uncertain that inpaint masks are usable with the external models. Observations on the OpenAI models, using the node editor:
Observations on the Gemini models, using the node editor:
If image mask-based inpainting isn't working, let's just remove the … The …
- Remove img2img and inpaint modes from Gemini models (Gemini has no bitmap mask or dedicated edit API; image editing works via reference images in the UI)
- Fix DALL-E 2 inpainting: convert grayscale mask to RGBA with alpha channel transparency (OpenAI expects transparent=edit area) and convert init image to RGBA when mask is present
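The mask conversion described in this commit can be sketched per pixel as follows. This is a dependency-free illustration on nested lists, not the PR's actual code; in practice an imaging library would perform the same operation on real image objects. It assumes that white (255) in the grayscale mask marks the region to edit, which the thread does not state, and the function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def grayscale_mask_to_rgba(mask: List[List[int]]) -> List[List[RGBA]]:
    """Convert a grayscale mask (0-255 per pixel) to RGBA pixels.

    Assumption: white (255) marks the region to edit, so it becomes fully
    transparent (alpha 0); black (0) is preserved and becomes opaque (alpha 255).
    """
    return [[(0, 0, 0, 255 - value) for value in row] for row in mask]


def rgb_to_rgba(image: List[List[RGB]]) -> List[List[RGBA]]:
    """Add an opaque alpha channel so the init image is RGBA like the mask."""
    return [[(r, g, b, 255) for (r, g, b) in row] for row in image]


mask = [[0, 255], [128, 0]]
rgba_mask = grayscale_mask_to_rgba(mask)

init_image = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
rgba_init = rgb_to_rgba(init_image)
```

If the mask polarity is the opposite (black means "edit"), the alpha value would be `value` instead of `255 - value`; getting this wrong would make the API edit everything except the intended region.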
|
@Pfannkuchensack Thanks for the recent fixes, and I'm looking forward to getting this PR finished and merged. It's been a lot of work! Unfortunately I haven't been able to get inpainting working with the DALL-E 2 model (which, as far as I can tell, is the only model that uses a mask). I assign a black-and-white bitmap mask, but the entire image gets modified, not just the masked region. Does inpainting work in your hands? I also have a philosophical question about whether the Gemini invocation node should even show the modes, the init image and the mask image fields. These are not supported by any Gemini model, and showing them as usable UI fields may confuse people. Similarly, please consider whether the OpenAI invocation node should show these fields, since only the old DALL-E 2 model uses them. The others are edit models.
- Remove DALL-E 2 from starter models (deprecated, shutdown May 12 2026)
- Enable img2img for GPT Image 1/1.5/1-mini (supports edits endpoint)
- Set Gemini models to txt2img only (no mask/edit API; editing via ref images)
- Hide mode/init_image/mask_image fields on Gemini node (not usable)
- Hide mask_image field on OpenAI node (no model supports inpaint)
|
The Gemini invocation node looks good and is ready to go. Major comment I suggest:
Minor comment |
|
Another thing I have noticed when using either of the external generation nodes: when I have "Use cache" and "Save to Gallery" checked in the external generation node and hit the … I would have expected to get multiple identical images using the cached values rather than no image.
- Hide OpenAI node's mode and init_image fields: OpenAI's API has no img2img/inpaint distinction (the edits endpoint is invoked automatically when reference images are provided). init_image is functionally identical to a reference image and was misleading users.
- Default use_cache to False for external image generation nodes: external API calls are non-deterministic and incur usage costs. Cache hits returned stale image references that did not produce new gallery entries on repeat invokes.
External image generation nodes use the standard invocation cache, but returning the cached output (with stale image_name references) on cache hits resulted in no new gallery entries — the Invoke button would spin indefinitely on repeat invokes with identical parameters. Override invoke_internal so that on cache hit, the cached images are loaded and re-saved as new gallery entries. The expensive API call is still skipped (cost saving), but the user sees a new image as expected.
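The cache-hit behaviour described in this commit can be illustrated with a toy model. This is a sketch under assumptions, not the PR's `invoke_internal` override: `Gallery`, `CachingExternalNode`, and `fake_api` are hypothetical, and the cache is keyed on the prompt alone, whereas a real invocation cache would key on all of the node's inputs.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Gallery:
    """Toy image store mapping generated image names to image bytes."""

    images: Dict[str, bytes] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def save(self, data: bytes) -> str:
        name = f"image_{next(self._ids)}.png"
        self.images[name] = data
        return name

    def load(self, name: str) -> bytes:
        return self.images[name]


@dataclass
class CachingExternalNode:
    gallery: Gallery
    call_api: Callable[[str], List[bytes]]  # the expensive, billable request
    cache: Dict[str, List[str]] = field(default_factory=dict)

    def invoke(self, prompt: str) -> List[str]:
        cached_names = self.cache.get(prompt)
        if cached_names is None:
            # Cache miss: pay for the API call and remember the resulting names.
            names = [self.gallery.save(img) for img in self.call_api(prompt)]
            self.cache[prompt] = names
            return names
        # Cache hit: skip the API call, but re-save copies as new gallery entries.
        return [self.gallery.save(self.gallery.load(n)) for n in cached_names]


api_calls: List[str] = []


def fake_api(prompt: str) -> List[bytes]:
    """Stand-in for the external provider; records each call it receives."""
    api_calls.append(prompt)
    return [prompt.encode("utf-8")]


gallery = Gallery()
node = CachingExternalNode(gallery=gallery, call_api=fake_api)
first = node.invoke("a red fox")
second = node.invoke("a red fox")
```

Here the second invoke returns a new image name backed by the same bytes, while `fake_api` is called only once, which mirrors the stated goal of saving cost while still producing a visible gallery entry.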
|
Almost there! I found just one more thing that I missed on the first go-rounds. In the OpenAI models, the "Remix" recall function is not restoring the advanced Quality, Background or Input Fidelity settings. |
Remix recall iterates through ImageMetadataHandlers but only Gemini's temperature handler was wired up — OpenAI's quality, background, and input_fidelity were stored in image metadata but never parsed back into the params slice. Add the three missing handlers so Remix restores these settings as expected.
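The failure mode described here, where a value is stored in metadata but has no handler to parse it back, can be shown with a small handler table. This is a language-neutral illustration written in Python; the real handlers live in the frontend, and `HANDLERS`, `parse_str`, `parse_float`, and `recall` are hypothetical names. Only the field names `temperature`, `quality`, `background`, and `input_fidelity` come from the commit message.

```python
from typing import Any, Callable, Dict


def parse_str(value: Any) -> str:
    if not isinstance(value, str):
        raise ValueError(f"expected a string, got {type(value).__name__}")
    return value


def parse_float(value: Any) -> float:
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"expected a number, got {type(value).__name__}")
    return float(value)


# A field is only restored on recall if it has an entry in this table.
HANDLERS: Dict[str, Callable[[Any], Any]] = {
    "temperature": parse_float,
    "quality": parse_str,
    "background": parse_str,
    "input_fidelity": parse_str,
}


def recall(metadata: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with every parseable, handled field restored."""
    restored = dict(params)
    for key, parse in HANDLERS.items():
        if key not in metadata:
            continue
        try:
            restored[key] = parse(metadata[key])
        except ValueError:
            pass  # keep the current setting if the stored value is malformed
    return restored


metadata = {"quality": "high", "background": "transparent", "input_fidelity": "low", "seed": 42}
params = {"quality": "auto", "background": "auto", "input_fidelity": "high", "temperature": 1.0}
restored = recall(metadata, params)
```

In this sketch `seed` is present in the metadata but is not restored because no handler is registered for it, which is the same shape of omission the commit fixes for the three OpenAI settings.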
Summary
This PR adds support for external model provider APIs with Google and OpenAI added for now.
It supports txt2img, img2img and image references.
I tried to make it fit well within the application and be easily extensible for future models.
Related Issues / Discussions
#8633 includes functionality requested here
QA Instructions
Select an external provider in the model setup dialog and add an API key.
Select a new model from the dropdown list.
...
Profit
Checklist
What's New copy (if doing a release after this PR)